Rich Source-Side Context for Statistical Machine Translation

Authors

  • Kevin Gimpel
  • Noah A. Smith

Abstract

We explore the augmentation of statistical machine translation models with features of the context of each phrase to be translated. This work extends several existing threads of research in statistical MT, including the use of context in example-based machine translation (Carl and Way, 2003) and the incorporation of word sense disambiguation into a translation model (Chan et al., 2007). The context features we consider use surrounding words and part-of-speech tags, local syntactic structure, and other properties of the source language sentence to help predict each phrase’s translation. Our approach requires very little computation beyond the standard phrase extraction algorithm and scales well to large data scenarios. We report significant improvements in automatic evaluation scores for Chinese-to-English and English-to-German translation, and also describe our entry in the WMT08 shared task based on this approach.
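The abstract gives no implementation details, so the following is only a minimal sketch, assuming a tokenized, POS-tagged source sentence and phrase pairs already produced by standard phrase extraction. It shows one way to read off features of the context around a source phrase (the neighboring words and tags) and to estimate a context-conditioned translation distribution by relative frequency. The one-word window, the feature names, the boundary markers, the toy data, and the functions `context_features` and `context_translation_probs` are hypothetical illustrations, not the authors' model.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical sentence-boundary markers used when the phrase touches an edge.
BOS, EOS = "<s>", "</s>"


def context_features(words: List[str], tags: List[str], i: int, j: int) -> Dict[str, str]:
    """Return simple source-side context features for the phrase words[i:j].

    Hypothetical feature set: the word and POS tag immediately to the left
    and right of the phrase, with boundary markers at sentence edges.
    """
    return {
        "left_word": words[i - 1] if i > 0 else BOS,
        "right_word": words[j] if j < len(words) else EOS,
        "left_tag": tags[i - 1] if i > 0 else BOS,
        "right_tag": tags[j] if j < len(tags) else EOS,
    }


def context_translation_probs(
    extracted: List[Tuple[str, str, Dict[str, str]]], feature: str
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Estimate p(target | source phrase, feature value) by relative frequency.

    `extracted` holds (source_phrase, target_phrase, features) triples, assumed
    to be collected in the same pass that enumerates ordinary phrase pairs.
    """
    joint: Counter = Counter()     # counts of ((source, feature value), target)
    marginal: Counter = Counter()  # counts of (source, feature value)
    for src, tgt, feats in extracted:
        key = (src, feats[feature])
        joint[(key, tgt)] += 1
        marginal[key] += 1
    probs: Dict[Tuple[str, str], Dict[str, float]] = defaultdict(dict)
    for (key, tgt), count in joint.items():
        probs[key][tgt] = count / marginal[key]
    return dict(probs)


if __name__ == "__main__":
    # Toy English-German example (invented for illustration only).
    print(context_features(["he", "sat", "on", "the", "bank"],
                           ["PRP", "VBD", "IN", "DT", "NN"], 4, 5))
    toy = [
        ("bank", "Ufer", {"left_word": "river"}),
        ("bank", "Ufer", {"left_word": "river"}),
        ("bank", "Bank", {"left_word": "the"}),
        ("bank", "Ufer", {"left_word": "the"}),
    ]
    print(context_translation_probs(toy, "left_word"))
```

In this sketch the context counts are gathered alongside the ordinary phrase-pair counts, so the added work is one extra counter per feature type; how the paper actually turns such features into translation model scores is not described in the abstract.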

Similar Resources

Utilizing Source Context in Statistical Machine Translation

Current methods for statistical machine translation typically utilize only a limited context in the input sentence. Many language phenomena thus remain out of their reach, for example long-distance agreement in morphologically rich languages or lexical selection, which often require information from the whole source sentence. In this work, we present an overview of approaches for including wider conte...

An Empirical Analysis of Source Context Features for Phrase-Based Statistical Machine Translation

Statistical phrase-based machine translation systems make little use of context information: while the language model takes into account target side context, context information on the source side is typically not integrated into phrase-based translation systems. Translational features such as phrase translation probabilities are learned from phrase-translation pairs extracted from word-al...

Target-Side Context for Discriminative Models in Statistical Machine Translation

Discriminative translation models utilizing source context have been shown to help statistical machine translation performance. We propose a novel extension of this work using target context information. Surprisingly, we show that this model can be efficiently integrated directly in the decoding process. Our approach scales to large training data sizes and results in consistent improvements in ...

Statistical Machine Translation of English – Manipuri using Morpho-syntactic and Semantic Information

The English-Manipuri language pair is one of the rarely investigated pairs and has restricted bilingual resources. The development of a factored Statistical Machine Translation (SMT) system between English as the source and Manipuri, a morphologically rich language, as the target is reported. The roles of the suffixes and dependency relations on the source side and case markers on the target side are identified as im...

Enriching machine-mediated speech-to-speech translation using contextual information

Conventional approaches to speech-to-speech (S2S) translation typically ignore key contextual information such as prosody, emphasis, and discourse state in the translation process. Capturing and exploiting such contextual information is especially important in machine-mediated S2S translation as it can serve as a complementary knowledge source that can potentially aid the end users in improved unde...

Publication date: 2008